Abstract

The fail-stop failure model appears frequently in the distributed systems literature. How- 
ever, in an asynchronous distributed system, the fail-stop model cannot be implemented. In 
particular, it is impossible to reliably detect crash failures in an asynchronous system. 

In this paper, we show that it is possible to specify and implement a failure model that is 
indistinguishable from the fail-stop model from the point of view of any process within an 
asynchronous system. We give necessary conditions for a failure model to be indistinguishable 
from the fail-stop model, and derive lower bounds on the amount of process replication needed 
to implement such a failure model. We present a simple one-round protocol for implementing 
one such failure model, which we call simulated fail-stop. 


1 Introduction 

The fail-stop failure model appears frequently in the distributed systems literature. The fail-stop 
model makes two assumptions about the failure behavior of processes: processes fail only by 
permanently crashing, and when a process crashes, surviving processes will eventually detect 
that failure. The fail-stop model is appealing because it makes distributed algorithms easier to 
formulate: fail-stop failures are easy to tolerate. 

For example, suppose that a set of processes {1, 2, ..., n} wish to solve the election problem: at any
point, no more than one process of the set can be the leader, and as long as not all processes fail,
there will eventually be a leader. Assuming a fail-stop failure model leads
to a very simple solution. Each process maintains a local copy of the list (1, 2, ..., n), and the first
element of this list denotes the leader. When process i detects the failure of process j, i removes j
from its local copy of the list. When i finds itself the first element of its list, i knows that it is the
leader. Since a process becomes the head of the list only when all lower-numbered processes have
failed, there is no more than one leader at any time; and, as long as a process eventually detects
the failure of the lower-numbered processes, it will eventually become the leader.
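
The election scheme just described can be sketched in a few lines of Python. This is only an illustration: the class name ElectionProcess and the callback on_failed are hypothetical, and failure detection is assumed to be delivered by some fail-stop mechanism outside the sketch.

```python
class ElectionProcess:
    """One process in the fail-stop election sketch (names are illustrative)."""

    def __init__(self, pid, n):
        self.pid = pid
        # Local copy of the list (1, 2, ..., n); the head denotes the leader.
        self.candidates = list(range(1, n + 1))

    def on_failed(self, j):
        # Called when this process detects the failure of process j.
        if j in self.candidates:
            self.candidates.remove(j)

    def is_leader(self):
        # A process is the leader once it is the first element of its own list.
        return self.candidates[0] == self.pid


procs = {i: ElectionProcess(i, 4) for i in range(1, 5)}
# Process 1 crashes; under fail-stop every survivor eventually detects it.
for i in (2, 3, 4):
    procs[i].on_failed(1)
leaders = [i for i in (2, 3, 4) if procs[i].is_leader()]
```

After the detections, only process 2 considers itself the leader, which is the behavior the fail-stop assumptions are meant to guarantee.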

'This work was supported by the Defense Advanced Research Projects Agency (DoD) under NASA Ames grant 
number NAG 2-593, and by grants from IBM and Siemens. The views, opinions, and findings contained in this report 
are those of the authors and should not be construed as an official Department of Defense position, policy, or decision. 

'This author is also supported by an AT&T PhD Scholarship. 

A serious limitation of assuming a fail-stop failure model is that it is often an unrealistic
assumption. In particular, in an asynchronous distributed system (i.e., a system with no shared
memory, arbitrary message delivery times, no global clock, and arbitrary process speeds), the
fail-stop model cannot be implemented. This is because it is impossible to reliably detect crash
failures in an asynchronous system (see Theorem 1).

On the other hand, there are systems (e.g., ISIS [BJ87]) that provide crash-failure detection with- 
out making synchrony assumptions. They do this by allowing failures to be detected erroneously, 
e.g., by using timeouts and gossip messages ([RB91]) to attain agreement among a set of processes 
that a process p has failed even though that process p may not have crashed. Hence, they provide 
a failure model that resembles fail-stop in some ways but is not strictly fail-stop. 

In this paper, we present a failure model, called simulated fail-stop, that is internally indistin- 
guishable from fail-stop, meaning that under this model, no process in the system can determine 
that it is not running in a system in which the fail-stop assumption holds. We give a set of con- 
ditions that are necessary in order for any model to be indistinguishable from fail-stop, and we 
prove that simulated fail-stop is indistinguishable from fail-stop. We give lower bounds on the 
number of processes needed for a one-round implementation of the simulated fail-stop model to 
tolerate t failures, and show that these bounds hold for any model that is indistinguishable from 
fail-stop. Finally, we show that the bounds are tight by giving a protocol that attains them. 

The paper is organized as follows. Section 2 describes the system model used throughout 
the paper, including notation, definitions, and a formal logic used to describe system properties. 
Section 3 specifies the fail-stop and simulated fail-stop models, introduces the notion of indistin- 
guishability of failure models, and proves that certain conditions are necessary and/or sufficient 
for a failure model to be indistinguishable from fail-stop. Section 4 gives lower bounds on the
number of processes needed to tolerate t failures for one-round failure detection protocols imple- 
menting the simulated fail-stop model, and shows that these bounds hold for any model that is 
indistinguishable from fail-stop. Section 5 shows that these lower bounds are tight by presenting 
a protocol that meets them. Section 6 concludes the paper and discusses the work that remains to 
be done on this topic. 


2 System Model 

We consider a distributed system consisting of a set of n processes P = {1, 2, . . . , n}. A process 
fails by simply stopping execution (crashing), and a failed process does not recover. The system 
is asynchronous, meaning that the rate of execution of any process with respect to any other is 
unbounded and there are no physical clocks. Between any two processes i and j there exist two 
unidirectional FIFO channels: $C_{i,j}$ from i to j and $C_{j,i}$ from j to i. Processes communicate only
by sending and receiving messages over these channels. The channels are nonfaulty: they do 
not lose, generate, or garble messages. Message delivery time is unbounded. We assume for 
simplicity that channels have infinite buffers and that all messages m are unique (they can easily
be made so by including in m its source and a sequence number). The state of a channel is the 
sequence of messages that have been sent along the channel but not received along the channel. 

A process is defined by a set of states, one of which is denoted the initial state. The state of 
a process i consists of the values of all internal variables of the process, plus the values of n + 1 
additional boolean variables that are defined as follows: 

• $\mathit{crash}_i$. This variable is initially false and can become true at any time. Once $\mathit{crash}_i$ becomes
true, the state of i does not change further. (This models the failure of i.)

• $\forall j \in P$: $\mathit{failed}_i(j)$. This variable is initially false for all values of j, and becomes true when i
detects the crash of process j. Once $\mathit{failed}_i(j)$ becomes true, it remains true forever. Exactly
when $\mathit{failed}_i(j)$ becomes true with respect to when $\mathit{crash}_j$ becomes true is discussed in this
paper.

A global state of the system is a set of process and channel states. An initial global state is the 
global state in which each process state is an initial state and each channel state is the empty 
sequence. 

An event e is a function that maps global states to global states. An event e applied to a global
state $\Sigma$ yields a new global state $\Sigma'$ that differs from $\Sigma$ in the local state of exactly one process i
and the state of at most one channel incident on i. We say in this case that e is an event of i, and
that e changes the state of i.

If an event e of process i changes the state of $C_{i,j}$ for some j, then we call e a send event. A send
event changes the state of a channel by appending a message m to the sequence of messages on
that channel. If e changes the state of $C_{j,i}$ for some j, then we call e a receive event. A receive event
changes the state of a channel by removing a message from the head of the sequence of messages
on that channel.

We define events, runs, and predicates formally in Appendix A.1. Informally, send, receive,
crash, and failure detection events are defined as follows:

• $\mathit{send}_i(j, m)$ denotes the event whereby process i sends the message m to process j.

• $\mathit{recv}_i(j, m)$ denotes the event whereby process i receives the message m from process j.

• $\mathit{crash}_i$ denotes the event whereby $\mathit{crash}_i$ becomes true.

• $\mathit{failed}_i(j)$ denotes the event whereby $\mathit{failed}_i(j)$ becomes true.

Definition 1 A run of the system is an infinite sequence of global states of the system:
$r = (\Sigma_0, \Sigma_1, \Sigma_2, \ldots)$, where $\Sigma_0$ is an initial global state and there exists a sequence of
events $(e_0, e_1, e_2, \ldots)$ such that for all $i \geq 0$, $\Sigma_{i+1} = e_i(\Sigma_i)$.

Definition 2 Given any run $r = (\Sigma_0, \Sigma_1, \Sigma_2, \ldots)$, the history of r, denoted $\mathcal{H}_r$, is the sequence of
events $(e_0, e_1, e_2, \ldots)$ such that for all $i \geq 0$, $\Sigma_{i+1} = e_i(\Sigma_i)$.

Note that for any run r, $\mathcal{H}_r$ is uniquely determined. Furthermore, r can be constructed from a
history $\mathcal{H}_r$ and the initial global state $\Sigma_0$.

Throughout this paper, we use the notation $\mathcal{H}_r = (\cdots e_i \cdots e_j \cdots e_k \cdots)$. This denotes that $\mathcal{H}_r$
is of the form $(x; e_i; y; e_j; z; e_k; w)$, where $e_i$, $e_j$, and $e_k$ are events, x, y, and z are finite sequences of
events, and w is an infinite sequence of events.

We specify properties of systems using predicate logic over global states and linear-time
temporal logic over (infinite) suffixes of runs [Pne77]. We define the boolean predicates $\mathrm{SEND}_i(j, m)$
and $\mathrm{RECV}_i(j, m)$ as follows.

• $\forall i, j, m$: $\mathrm{SEND}_i(j, m)$ and $\mathrm{RECV}_i(j, m)$ are false in an initial global state.

• $\mathit{send}_i(j, m)(\Sigma) \models \mathrm{SEND}_i(j, m)$. That is, $\mathrm{SEND}_i(j, m)$ becomes true when $\mathit{send}_i(j, m)$ has
occurred.

• $\mathit{recv}_i(j, m)(\Sigma) \models \mathrm{RECV}_i(j, m)$. That is, $\mathrm{RECV}_i(j, m)$ becomes true when $\mathit{recv}_i(j, m)$ has
occurred.

Furthermore, both $\mathrm{SEND}_i(j, m)$ and $\mathrm{RECV}_i(j, m)$ are stable by definition: once such a predicate
becomes true in a run, it remains true for the remainder of the run [CL85].

We define the boolean predicates $\mathrm{CRASH}_i$ and $\mathrm{FAILED}_i(j)$ as follows. Let $\Sigma$ be a global state.

• $\Sigma \models \mathrm{CRASH}_i$ if and only if $\mathit{crash}_i$ is true in $\Sigma$.

• $\forall j$: $\Sigma \models \mathrm{FAILED}_i(j)$ if and only if $\mathit{failed}_i(j)$ is true in $\Sigma$.

Note that both $\mathrm{CRASH}_i$ and $\mathrm{FAILED}_i(j)$ are stable by assumption: once these local variables become
true in the local state of i, they remain true thereafter.

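Let $s = (\Sigma_0, \Sigma_1, \Sigma_2, \ldots)$ be a suffix of a run, let $\varphi$ be a predicate, and let $\psi$ be a temporal logic
formula.

• $(s, k) \models \varphi$ iff $\Sigma_k \models \varphi$.

• $(s, k) \models \Diamond\psi$ iff $\exists j \geq k: (s, j) \models \psi$.

• $(s, k) \models \Box\psi$ iff $\forall j \geq k: (s, j) \models \psi$.

Furthermore, we abbreviate $(r, 0) \models \psi$ as $r \models \psi$.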

We define the failed-before relation as follows: 

Definition 3 If $r \models \Diamond\mathrm{FAILED}_j(i)$ in some run r, we say that i failed before j in r.

Note that it is possible that both $\mathrm{CRASH}_i$ and $\mathrm{CRASH}_j$ hold in some global state yet neither i failed
before j nor j failed before i.

We use a version of the happens-before relation of [Lam78]. Given two events $e_1$ and $e_2$, define
$e_1 \rightarrow e_2$ (read "$e_1$ happens before $e_2$") in some history $\mathcal{H}_r$ if one of the three following conditions
holds:

1. $e_1$ and $e_2$ are of the same process, and either $e_1 = e_2$ or $e_1$ precedes $e_2$ in $\mathcal{H}_r$;

2. $e_1 = \mathit{send}_i(j, m)$ for some value of i, j, and m, and $e_2 = \mathit{recv}_j(i, m)$;

3. there exists an event e such that $e_1 \rightarrow e$ and $e \rightarrow e_2$.

The happens-before relation as defined here is the same as that given in [Lam78], except that
our relation is reflexive. This is for notational convenience. Note that for all $e_1 \neq e_2$, $e_1 \rightarrow e_2$
implies that $e_1$ precedes $e_2$ in $\mathcal{H}_r$. The converse does not hold, however.
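
As an illustration of the three conditions, the following Python sketch computes the happens-before relation for a finite prefix of a history. The tuple encoding of events, (kind, process, peer, message), is an assumption made for this example and does not come from the paper.

```python
def happens_before(history):
    """Return the set of index pairs (a, b) with history[a] -> history[b].

    Events are tuples (kind, proc, peer, msg); send_i(j, m) is encoded as
    ("send", i, j, m) and recv_j(i, m) as ("recv", j, i, m).
    """
    n = len(history)
    hb = {(a, a) for a in range(n)}  # the relation is reflexive
    for a in range(n):
        for b in range(a + 1, n):
            ka, pa, qa, ma = history[a]
            kb, pb, qb, mb = history[b]
            same_process = pa == pb                      # condition 1
            send_recv = (ka == "send" and kb == "recv"   # condition 2
                         and pb == qa and qb == pa and ma == mb)
            if same_process or send_recv:
                hb.add((a, b))
    # Condition 3: transitive closure (Warshall's algorithm).
    for k in range(n):
        for a in range(n):
            if (a, k) in hb:
                for b in range(n):
                    if (k, b) in hb:
                        hb.add((a, b))
    return hb


h = [("failed", 1, 2, None), ("send", 1, 3, "m"),
     ("recv", 3, 1, "m"), ("crash", 2, None, None)]
hb = happens_before(h)
```

In this example the detection by process 1 happens before the receive at process 3, but it is not related to the crash of process 2, even though the crash appears later in the history. This is the sense in which the converse of the implication above fails.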

Let r be a run. Let $r_i$ be the sequence of states of i in r, with repeated states removed (i.e., so
that adjacent states are distinct). If x and y are runs, then we say that run x is isomorphic to run y
with respect to process i, denoted $x =_i y$, if and only if $x_i = y_i$. In other words, $x =_i y$ if and only
if runs x and y are indistinguishable to process i. Similarly, $r_Q$ for $Q \subseteq P$ is the sequence of states
of processes $j \in Q$ in r with repeated states removed, and $x =_Q y$ if and only if $x_Q = y_Q$. (See
[CM86] for a detailed discussion of the ramifications of indistinguishability of runs.)
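
The projection used in this definition can be sketched as follows. The encoding is an assumption for illustration: a run prefix is a list of dictionaries mapping each process to its local state. The sketch also assumes that isomorphism with respect to a set Q is checked process by process, which is the reading under which two runs may interleave the events of different processes differently and still be indistinguishable.

```python
def local_view(run, p):
    """States of process p in the run prefix, with adjacent repeats removed."""
    view = []
    for global_state in run:
        if not view or view[-1] != global_state[p]:
            view.append(global_state[p])
    return view


def isomorphic(x, y, procs):
    """True if x and y are indistinguishable to every process in procs."""
    return all(local_view(x, p) == local_view(y, p) for p in procs)


# Two prefixes in which processes 1 and 2 take their steps in opposite order.
x = [{1: "a", 2: "p"}, {1: "b", 2: "p"}, {1: "b", 2: "q"}]
y = [{1: "a", 2: "p"}, {1: "a", 2: "q"}, {1: "b", 2: "q"}]
z = [{1: "a", 2: "p"}, {1: "c", 2: "p"}, {1: "c", 2: "q"}]
```

Here x and y differ as sequences of global states, yet each process passes through the same local states in both, so neither process can tell them apart; z is distinguishable to process 1.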

3 Specification of Failure Models 

A failure model describes the manner in which the components of a system can fail. For our 
purposes, a failure model constrains how crash events and failed events can occur with respect to 
each other. We give these constraints as a set of properties and define the failure model as the set 
of runs that satisfy these properties. 

3.1 The Fail-Stop Failure Model 

The minimal set of fail-stop assumptions found in the literature is that in any infinite run of the
system, a process's failure is eventually detected by all processes that do not crash, and that there
are no false detections of failure. These two conditions specify the failure model defined in [Sch84].
Hence, we adopt this as the definition of the fail-stop failure model.

Formally, the two fail-stop conditions are:

FS1: $\forall r, i: r \models \Box(\mathrm{CRASH}_i \Rightarrow \forall j: \Diamond(\mathrm{CRASH}_j \lor \mathrm{FAILED}_j(i)))$

FS2: $\forall r, i, j: r \models \Box(\mathrm{FAILED}_j(i) \Rightarrow \mathrm{CRASH}_i)$

We denote with FS the set of runs satisfying properties FS1 and FS2. 
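
To make the two conditions concrete, the sketch below checks them on a finite prefix of a history. The event encoding (kind, process, peer) is assumed for this example. Because FS1 is a liveness property, a finite prefix can only approximate it; the function check_fs1_at_end merely tests whether every detection that FS1 demands has already happened by the end of the prefix.

```python
def check_fs2(history):
    """FS2 on a finite prefix: failed_j(i) never occurs before crash_i."""
    crashed = set()
    for kind, proc, peer in history:
        if kind == "crash":
            crashed.add(proc)
        elif kind == "failed" and peer not in crashed:
            return False  # proc detected peer, but peer has not crashed
    return True


def check_fs1_at_end(history, processes):
    """Approximate FS1: by the end of the prefix, every crash has been
    detected by every process that has not itself crashed."""
    crashed = {p for kind, p, _ in history if kind == "crash"}
    detected = {(p, q) for kind, p, q in history if kind == "failed"}
    return all((j, i) in detected
               for i in crashed
               for j in processes if j not in crashed)


good = [("crash", 1, None), ("failed", 2, 1), ("failed", 3, 1)]
bad = [("failed", 2, 1), ("crash", 1, None)]
```

The prefix named bad contains an erroneous detection followed by a crash. It violates FS2, but it is the kind of behavior that the simulated fail-stop model of Section 3.3 is designed to permit.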

Theorem 1 In an asynchronous system in which crash failures are possible, properties FS1 and FS2 are
impossible to implement. 

Proof : In [CT91], an algorithm is given for solving Consensus with a Strong Failure Detector. A 
Strong Failure Detector is shown to be strictly weaker than a Perfect Failure Detector, implying 
that a Perfect Failure Detector can also be used to solve Consensus. A solution to Consensus 
contradicts the result of [FLP85]; therefore, a Perfect Failure Detector cannot be constructed. 

A Perfect Failure Detector is defined in [CT91] as a failure detector satisfying Strong Com- 
pleteness and Strong Accuracy. These two properties are identical to FS1 and FS2. Therefore, 
implementing FS is equivalent to implementing a Perfect Failure Detector, and is therefore im- 
possible. □ 

3.2 Indistinguishable Failure Models

A process determines which event to execute based on its state and the messages that it has 
received. A run r is isomorphic to a run r' with respect to a process i if i executes the same events 
in both r and r'. We know that the two runs are isomorphic with respect to i if i starts in the same 
initial state in both runs, receives the same messages in the same order in both runs, and makes 
the same nondeterministic choices (if any) in both runs. Consider a run r of a system. If r is not 
in FS but is isomorphic with respect to i to a run r' in FS, then the events i executes are the same
as if it were running in a system satisfying the fail-stop assumptions. Hence, if $r =_P r'$, then no
process in P can determine that r is not in FS.

Definition 4 A failure model M is indistinguishable from the fail-stop model if for any run $r \in M$, there
exists a run $r' \in \mathrm{FS}$ such that $r =_P r'$ (that is, r is indistinguishable from r' to every process in P).

Consider the election protocol described in Section 1. If a run of this protocol is in a failure 
model M that is indistinguishable from, but not identical to FS, then there may be more than one 
leader in some global state, but no process will be able to determine this. Thus, internally the 
execution is the same as if there were only one leader at a time. 

Recall that FS cannot be implemented in an asynchronous system because
the crash of a process cannot be reliably detected. A failure model M that can be implemented
and is indistinguishable from FS must be weaker than FS. However, it cannot be too weak; at the
very least, a process i must not be able to determine that some process j executes an event after
i detects that j has crashed. Furthermore, if a process detects the failure of i then i must crash
at some point, and process crashes must have been able to occur in some total order. Hence, the
following three conditions are necessary for indistinguishability from FS.

Condition 1 For all runs r, if $r \models \Diamond\mathrm{FAILED}_i(j)$, then $r \models \Diamond\mathrm{CRASH}_j$.

Condition 2 The failed-before relation must be acyclic. That is, for all runs r and for all k, there cannot exist
processes $x_1, x_2, \ldots, x_k$ such that
$r \models \Diamond(\mathrm{FAILED}_{x_1}(x_2) \land \mathrm{FAILED}_{x_2}(x_3) \land \cdots \land \mathrm{FAILED}_{x_{k-1}}(x_k) \land \mathrm{FAILED}_{x_k}(x_1))$.

Condition 3 For all runs r, there cannot be an event e of process j such that $\mathit{failed}_i(j) \rightarrow e$ in $\mathcal{H}_r$.
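
Condition 2 can be checked mechanically on a finite prefix of a history by treating failed-before as a directed graph and searching for a cycle. The following sketch assumes events are encoded as (kind, process, peer) tuples, so that ("failed", i, j) stands for $\mathit{failed}_i(j)$; the encoding is illustrative only.

```python
def failed_before_is_acyclic(history):
    """Depth-first search for a cycle in the failed-before relation."""
    # Edge j -> i means "j failed before i", i.e. failed_i(j) occurred.
    edges = {}
    for kind, proc, peer in history:
        if kind == "failed":
            edges.setdefault(peer, set()).add(proc)

    WHITE, GREY, BLACK = 0, 1, 2
    color = {}

    def visit(u):
        color[u] = GREY
        for v in edges.get(u, ()):
            c = color.get(v, WHITE)
            # A GREY successor is on the current path, so we found a cycle.
            if c == GREY or (c == WHITE and not visit(v)):
                return False
        color[u] = BLACK
        return True

    return all(color.get(u, WHITE) != WHITE or visit(u) for u in list(edges))


mutual = [("failed", 1, 2), ("failed", 2, 1)]   # 1 and 2 detect each other
chain = [("failed", 1, 2), ("failed", 3, 2)]    # both detect process 2
```

A detection of the form ("failed", i, i) also counts as a cycle of length one in this sketch.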

Theorem 2 If failure model M is indistinguishable from FS, then all runs of M satisfy Conditions 1-3. 

Proof: 

Condition 1 In order for two runs to be isomorphic, their histories must contain the same events.
For every run r that satisfies FS, $\mathit{failed}_i(j) \in \mathcal{H}_r \Rightarrow \mathit{crash}_j \in \mathcal{H}_r$. Therefore, the same must be
true of every run that satisfies M. □

Condition 2 For contradiction, suppose that there is some run r of M such that r does not satisfy 
Condition 2. We show that there is no run r' satisfying FS that is isomorphic to r with respect 
to P. 

If r does not satisfy Condition 2, then there is some set of processes $\{x_0, x_1, \ldots, x_k\}$ such that
$\mathcal{H}_r = (\cdots \mathit{failed}_{x_0}(x_1) \cdots \mathit{failed}_{x_1}(x_2) \cdots \mathit{failed}_{x_{k-1}}(x_k) \cdots \mathit{failed}_{x_k}(x_0) \cdots)$. For any run
r' satisfying FS, $\mathcal{H}_{r'}$ must contain $\mathit{crash}_{x_i}$ for all $0 \leq i \leq k$. Furthermore, $\mathit{crash}_{x_i}$ must occur
before $\mathit{failed}_{x_{i \ominus 1}}(x_i)$ and $\mathit{failed}_{x_i}(x_{i \oplus 1})$ must occur before $\mathit{crash}_{x_i}$, where $\ominus$ and $\oplus$ are $-$ and $+$
modulo $k + 1$ respectively. By transitivity, this leads to circular constraints on $\mathcal{H}_{r'}$: $\mathit{crash}_{x_0}$
must occur before $\mathit{failed}_{x_k}(x_0)$, which must occur before $\mathit{crash}_{x_k}$, which must occur before
$\mathit{failed}_{x_{k-1}}(x_k)$, ..., $\mathit{crash}_{x_1}$ must occur before $\mathit{failed}_{x_0}(x_1)$, which must occur before $\mathit{crash}_{x_0}$. It
is impossible to satisfy all of these ordering constraints in a valid run. Therefore, there is no
run r' isomorphic to r that satisfies FS. □

Condition 3 For contradiction, suppose that there is some run r of M such that r does not satisfy 
Condition 3. We will show that there is no run r' satisfying FS that is isomorphic to r with 
respect to P. 

If r does not satisfy Condition 3, then $\mathcal{H}_r = (\cdots \mathit{failed}_i(j) \cdots \mathit{send}_i(k, m_k) \cdots \mathit{recv}_j(\ell, m_j) \cdots e_j \cdots)$,
where $\mathit{send}_i(k, m_k) \rightarrow \mathit{recv}_j(\ell, m_j)$. For any r' isomorphic to r, $\mathcal{H}_{r'}$ must maintain
the order of $\mathit{failed}_i(j)$, $\mathit{send}_i(k, m_k)$, and $\mathit{recv}_j(\ell, m_j)$ in order to satisfy the happens-before
relation. However, for r' to satisfy FS, $\mathit{crash}_j$ must occur before $\mathit{failed}_i(j)$ in $\mathcal{H}_{r'}$. This means
that in $\mathcal{H}_{r'}$, $\mathit{crash}_j$ must occur before $\mathit{recv}_j(\ell, m_j)$, which contradicts the definition of $\mathit{crash}_j$.
Therefore, there is no run r' isomorphic to r that satisfies FS. □

We have shown that Conditions 1, 2, and 3 are necessary for a failure model to be indistin- 
guishable from fail-stop. However, these conditions are not sufficient. 

Theorem 3 There exists a run r that satisfies Conditions 1-3 such that $\neg\exists r': r' =_P r \land r' \in \mathrm{FS}$.

Proof: Let r be the following run:

$\mathit{failed}_y(x);\ \mathit{send}_y(a, m_a);\ \mathit{recv}_a(y, m_a);\ \mathit{crash}_a;\ \mathit{failed}_b(a);\ \mathit{send}_b(x, m_x);\ \mathit{recv}_x(b, m_x);\ \mathit{crash}_x \cdots$

For any r' isomorphic to r, we have the following ordering constraints on $\mathcal{H}_{r'}$:

• $\mathit{failed}_y(x) \rightarrow \mathit{send}_y(a, m_a) \rightarrow \mathit{recv}_a(y, m_a) \rightarrow \mathit{crash}_a$

• $\mathit{failed}_b(a) \rightarrow \mathit{send}_b(x, m_x) \rightarrow \mathit{recv}_x(b, m_x) \rightarrow \mathit{crash}_x$

• $\mathit{crash}_x$ must occur before $\mathit{failed}_y(x)$

• $\mathit{crash}_a$ must occur before $\mathit{failed}_b(a)$

It is impossible to satisfy all of these ordering constraints in a valid run. Therefore, there is no 
run r' isomorphic to r that satisfies FS. □ 
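
The circularity of these four constraints can be confirmed mechanically. The sketch below encodes each "must occur before" pair as an edge and asks the standard-library graphlib module (Python 3.9 or later) for a topological order; the event labels are plain strings chosen for this illustration.

```python
from graphlib import TopologicalSorter, CycleError

# Each pair (a, b) means "a must occur before b" in the history of r'.
constraints = [
    ("failed_y(x)", "send_y(a,m_a)"),
    ("send_y(a,m_a)", "recv_a(y,m_a)"),
    ("recv_a(y,m_a)", "crash_a"),
    ("failed_b(a)", "send_b(x,m_x)"),
    ("send_b(x,m_x)", "recv_x(b,m_x)"),
    ("recv_x(b,m_x)", "crash_x"),
    ("crash_x", "failed_y(x)"),   # required by FS2
    ("crash_a", "failed_b(a)"),   # required by FS2
]


def has_valid_order(pairs):
    """True if some total order of the events satisfies every pair."""
    sorter = TopologicalSorter()
    for before, after in pairs:
        sorter.add(after, before)  # `after` depends on `before`
    try:
        list(sorter.static_order())
        return True
    except CycleError:
        return False
```

With only the six happens-before pairs an order exists; adding the two FS2 requirements closes the cycle, so no history can satisfy all eight.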

Theorem 3 implies that a failure model M that satisfies Conditions 1-3 may not be indistin- 
guishable from FS. In the next section, we give a set of conditions that are sufficient, though not 
all are necessary. 

3.3 Simulated Fail-Stop

We give four properties that comprise a model that is indistinguishable from fail-stop. We call 
this model the simulated fail-stop model (sFS). 

To construct conditions for the sFS model, we weaken one of the conditions of the fail-stop 
model. Weakening FS1 yields a model in which some failures may be undetected. Under such a 
model, it could be impossible for a system to make progress. Therefore, we follow [CT91,CHT92, 
RB91] and weaken FS2. This yields a model in which nonexistent failures may be detected. 

FS1 is a liveness property. In a real system, it would be implemented using timeouts: each
process would periodically send a message to every other process. If process i were not to receive 
a message from process j within some predetermined length of time, then i would (perhaps 
erroneously) detect the failure of j. We assume for the remainder of this paper that there is some 
mechanism provided by the underlying system to implement FS1. 

We replace FS2 with the following four conditions: 

sFS2a: $\forall r, i, j: r \models \Box(\mathrm{FAILED}_i(j) \Rightarrow \Diamond\mathrm{CRASH}_j)$

This condition states that if process i detects that process j has crashed, then eventually j will crash
even if i's detection was erroneous. In conjunction with FS1, this condition implies Condition 1:
if $\mathit{failed}_i(j)$ occurs in $\mathcal{H}_r$, then $\mathit{crash}_j$ occurs in $\mathcal{H}_r$.

sFS2b: The failed-before relation is always acyclic.

This is Condition 2.

sFS1: FS1
sFS2a: $r \models \Box(\mathrm{FAILED}_i(j) \Rightarrow \Diamond\mathrm{CRASH}_j)$
sFS2b: The failed-before relation is acyclic.
sFS2c: $r \models \Box\,\neg\mathrm{FAILED}_i(i)$
sFS2d: $r \models \Box[\mathrm{FAILED}_i(j) \land \neg\mathrm{SEND}_i(k, m) \Rightarrow \Box((\mathrm{SEND}_i(k, m) \land \mathrm{RECV}_k(i, m)) \Rightarrow \mathrm{FAILED}_k(j))]$

Figure 1: Simulated Fail-Stop Conditions

sFS2c: $\forall r, i: r \models \Box\,\neg\mathrm{FAILED}_i(i)$

This condition states that a process never detects its own failure. That is, $\mathit{failed}_i(i)$ does not occur
in $\mathcal{H}_r$.

sFS2d: $\forall r, i, j, k: r \models \Box[\mathrm{FAILED}_i(j) \land \neg\mathrm{SEND}_i(k, m) \Rightarrow \Box((\mathrm{SEND}_i(k, m) \land \mathrm{RECV}_k(i, m)) \Rightarrow \mathrm{FAILED}_k(j))]$

This condition states that once i detects the failure of j, then any subsequent messages sent by i
to any process k will not be received until k has also detected the failure of j. That is, if $\mathit{send}_i(k, m)$
occurs after $\mathit{failed}_i(j)$ in $\mathcal{H}_r$, then $\mathit{failed}_k(j)$ occurs before $\mathit{recv}_k(i, m)$ in $\mathcal{H}_r$.

Properties sFS2c and sFS2d together imply Condition 3, as shown in the following lemma. 

Lemma 4 If sFS2c and sFS2d hold in a run r, then there cannot be an event e of process j such that
$\mathit{failed}_i(j) \rightarrow e$ in $\mathcal{H}_r$.

Proof: Consider any run r. If i = j, then the lemma is trivially true, because from sFS2c, $\mathit{failed}_i(i)$
does not appear in $\mathcal{H}_r$. Assume that $i \neq j$. For contradiction, let e be an event of j such that
$\mathit{failed}_i(j) \rightarrow e$ in $\mathcal{H}_r$. Since $\mathit{failed}_i(j)$ and e are of different processes, from the definition of the
happens-before relation there is a sequence of events
$\mathit{failed}_i(j) \rightarrow \mathit{send}_i(k_1, m_{k_1}) \rightarrow \mathit{recv}_{k_1}(i, m_{k_1}) \rightarrow \mathit{send}_{k_1}(k_2, m_{k_2}) \rightarrow \cdots \rightarrow \mathit{recv}_j(k_t, m_j) \rightarrow e$.
From sFS2d, each process in this chain, including j,
must have detected the failure of j by the time it receives its message. Therefore, $\mathit{failed}_j(j)$ must
occur in $\mathcal{H}_r$, which contradicts sFS2c. □

The sFS conditions are summarized in Figure 1. 

Theorem 5 The simulated fail-stop model (sFS) is indistinguishable from the fail-stop model (FS).

The full proof of this theorem is given in Appendix A.2. An outline of the proof is given below.

Consider a run r that satisfies FS1 and sFS2a-d but violates FS2. Then, there exists at least one
pair of processes i and j such that $r \models \Diamond(\mathrm{FAILED}_j(i) \land \neg\mathrm{CRASH}_i)$. For each such pair, by sFS2a,
$r \models \Diamond\mathrm{CRASH}_i$. Therefore, $\mathcal{H}_r = (\cdots \mathit{failed}_j(i) \cdots \mathit{crash}_i \cdots)$. It can be shown that an event e can
be moved within $\mathcal{H}_r$, resulting in $\mathcal{H}_{r'}$ such that $r' =_P r$, as long as the happens-before relation is
maintained in $\mathcal{H}_{r'}$. We show in Appendix A.2 that $\neg(\mathit{failed}_j(i) \rightarrow \mathit{crash}_i)$, and that therefore, $\mathit{crash}_i$
and all events e between $\mathit{failed}_j(i)$ and $\mathit{crash}_i$ in $\mathcal{H}_r$ such that $e \rightarrow \mathit{crash}_i$ can be moved to precede
$\mathit{failed}_j(i)$ in $\mathcal{H}_{r'}$. Thus, if r satisfies sFS2a-d, then the events in $\mathcal{H}_r$ can be rearranged so that $\mathit{crash}_i$
precedes $\mathit{failed}_j(i)$ for all i, j in $\mathcal{H}_{r'}$.


4 Lower Bounds 

The simulated fail-stop properties (FS1, sFS2a-d) put restrictions on the way in which failures are 
detected. Implementing these properties requires that processes follow a protocol for detecting 
failures. In this section, we give lower bounds on message complexity and replication for failure 
detection protocols implementing sFS. 

A one-round protocol for detecting a failure is one in which each process i exchanges one round of
messages with other processes before executing $\mathit{failed}_i(j)$. Any protocol simpler than a one-round
protocol would allow at least one process to unilaterally detect the failure of some other process.
Such a protocol, however, would limit which processes another process could detect as faulty.
For example, suppose that process i can unilaterally decide that process j has failed. Process i
can execute $\mathit{failed}_i(j)$ concurrently with any event of process j, and so process j can never execute
$\mathit{failed}_j(i)$. Hence, we will consider the class of one-round protocols in order to determine message
and replication complexity.

We say that a process i initiates a failure detection protocol when it "suspects" the failure
of another process j (e.g., due to a timeout at a lower level). In the first half of the round,
process i sends a message to all other processes; in the second half of the round, processes send
an acknowledgement message to i. We call the first message $\mathrm{SUSP}_{i,j}$ and the acknowledgement
message $\mathrm{ACK.SUSP}_{i,j}$. Upon completion of the failure detection protocol, i will execute either $\mathit{crash}_i$
or $\mathit{failed}_i(j)$ for some $j \neq i$.

A one-round protocol that implements sFS must avoid cycles in the failed-before relation since 
all runs in sFS satisfy sFS2b. Implementing sFS2b requires that in any run there is at least one 
process that participates in all failure detections. To see why this is so, consider the problem of 
avoiding cycles involving exactly two processes. Suppose that process a suspects the failure of 
process b. Before a can execute $\mathit{failed}_a(b)$, the failure detection protocol must ensure that $\mathit{failed}_b(a)$
has not been executed and that $\mathit{failed}_b(a)$ will not be executed in the future.

The failure detection protocol cannot require a to communicate with b directly, because b may
have indeed crashed. Therefore, the protocol must require a to receive information from, and
distribute information to, other processes. In particular, a must receive information from enough
other processes to be sure that $\mathit{failed}_b(a)$ has not been executed, and a must distribute information
to enough other processes to be sure that if $\mathit{failed}_a(b)$ is executed, then $\mathit{failed}_b(a)$ will not be executed
in the future.

The relevant information that a must disseminate is that a suspects the failure of b. In order 
for a to know that this information has been received by other processes, it must receive messages 
from other processes acknowledging that the failure of b is suspected. 

Definition 5 The quorum set $Q_{i,j}$ of $\mathit{failed}_i(j)$ is the set of processes from which i has received
acknowledgement messages relating to its suspicion of j's crash. Formally,
$Q_{i,j} = \{k \in P : \mathrm{SEND}_i(k, \mathrm{SUSP}_{i,j}) \land \mathrm{RECV}_i(k, \mathrm{ACK.SUSP}_{i,j})\}$.

The set $Q_{a,b}$ must be large enough to ensure that b, after hearing from $Q_{b,a}$, will not execute
$\mathit{failed}_b(a)$. In particular, the sets $Q_{a,b}$ and $Q_{b,a}$ must have a non-null intersection.

We call this property the Witness Property (W), because the quorum sets for any two failure 
detections must have at least one process (the witness ) in common. It can be shown that the same 
property must hold in order to avoid cycles of any size. The Witness Property can be stated 
formally as follows: 

W: $\bigcap_{\forall i,j:\ \mathrm{FAILED}_i(j)} Q_{i,j} \neq \emptyset$

That is, there is some process w that is in the quorum set of all failure detections. Note 
that this is a stronger condition than what is necessary, for example, in the update of replicated 
variables [Gif79] in which only each pair of quorum sets must intersect. 
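
The difference between pairwise intersection and the Witness Property can be seen in a small Python sketch. The function names are hypothetical, quorum sets are modeled as plain Python sets, and the property is taken to hold vacuously when no failure has been detected.

```python
def witnesses(quorum_sets):
    """Processes that belong to every quorum set in the collection."""
    sets = list(quorum_sets)
    return set.intersection(*sets) if sets else set()


def witness_property_holds(quorum_sets):
    """W: the quorum sets of all failure detections share a process."""
    sets = list(quorum_sets)
    return not sets or bool(witnesses(sets))


# Every pair of these quorums intersects, but no process is in all three.
pairwise_only = [{1, 2}, {2, 3}, {1, 3}]
# Process 3 is a witness to all three detections.
with_witness = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]
```

The first collection would be acceptable for replicated-variable updates in the style of [Gif79], yet it fails W.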

Theorem 6 $(\forall r: r \models \Box\,\mathrm{sFS2b}) \Rightarrow (\forall r: r \models \Box W)$.

It was argued above that $(r \models \Box\,\mathrm{sFS2b}) \Rightarrow (r \models \Box W)$ if only cycles of size two are possible.
The full proof of the theorem is given in Appendix A.3.

Since sFS2b (Condition 2) is necessary for indistinguishability from FS (see Section 3.2), The- 
orem 6 implies that W is necessary for any one-round protocol that implements a failure model 
indistinguishable from FS. Let t be the maximum number of crashes in any run, including those 
that arise from erroneous suspicions. The necessity of the Witness Property places a constraint on 
t as a function of n and on the number of messages that a process must wait for before detecting 
a failure. 

The simplest way to ensure that W holds in a one-round protocol is to require a process to wait 
for responses from every other process, except for those that are suspected to have failed, before 
detecting a failure. If there is always at least one process that never fails, nor is suspected of failing, 
then this process will be a witness to every failure detection that is executed. This implementation 
only requires that t < n. However, if n is large and t is small, then each failure detection requires 
a process to wait for many messages, which in practice could take a long time. 

An alternative implementation is to require a process to wait for a fixed, predetermined number
of responses before detecting a failure. This approach reduces the size of the quorum for which a 
process must wait, but it places a stronger restriction on the number of failures that can occur. 

Theorem 7 If the size of the quorum set is a fixed and equal size for each failure detection, then to guarantee
that $r \models \Box W$ when t failures are possible, the size of each quorum set must be strictly greater than $n\left(\frac{t-1}{t}\right)$.

Proof: We assume that in any run, no more than t failures will occur. Therefore, the largest possible
cycle in a run satisfying (simulated) fail-stop involves t processes. We must guarantee that any t
quorum sets $Q_1, \ldots, Q_t$ have a nonempty intersection.

Let the size of a quorum be x. Let $y = n - x$. Suppose $y = \lceil n/t \rceil$. Then there is a set
of t quorum sets such that $\forall i \in P: \exists j: i \notin Q_j$. In particular, let $Q_1 = P - \{1, 2, \ldots, y\}$,
$Q_2 = P - \{y + 1, y + 2, \ldots, 2y\}$, ..., $Q_t = P - \{n - y + 1, n - y + 2, \ldots, n\}$. By construction,
each process is not a member of at least one quorum. Therefore, $\bigcap_{i=1}^{t} Q_i = \emptyset$. Clearly, such a set of
quorum sets can also be constructed if $y > \lceil n/t \rceil$. Therefore, we must have $y < \lceil n/t \rceil$.

$x = n - y$

$x > n - \lceil n/t \rceil$

$x > \left\lfloor \frac{nt - n}{t} \right\rfloor$

$x > \left\lfloor \frac{n(t-1)}{t} \right\rfloor$

Therefore, the size of a quorum must be an integer strictly greater than $n\left(\frac{t-1}{t}\right)$. □

Corollary 8 If the minimum quorum size is used in a one-round protocol for failure detection, then it must
be the case that $n > t^2$.

Proof: In a one-round protocol, the size of the quorum is equal to the number of $\mathrm{ACK.SUSP}_{i,j}$
messages that process i must receive before executing $\mathit{failed}_i(j)$. Since i is in its own quorum, i
must wait for $\lfloor n(\frac{t-1}{t}) \rfloor$ messages before detecting j's failure. In order for the one-round protocol
to make progress, at least this many other processes must remain alive. Therefore, we have

$n - t > \left\lfloor n\left(\frac{t-1}{t}\right) \right\rfloor$

$\Rightarrow n - t > \left\lfloor n - \frac{n}{t} \right\rfloor = n - \left\lceil \frac{n}{t} \right\rceil$

$\Rightarrow t < \left\lceil \frac{n}{t} \right\rceil$

$\Rightarrow t < \frac{n}{t}$ (since t is an integer)

$\Rightarrow t^2 < n$

□
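
The bounds of Theorem 7 and Corollary 8 can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the quorum construction follows the proof of Theorem 7, with the last blocks shifted so that they end at process n.

```python
def min_quorum_size(n, t):
    """Smallest integer strictly greater than n(t-1)/t."""
    return (n * (t - 1)) // t + 1


def empty_intersection_quorums(n, t):
    """t quorums of size n - ceil(n/t) whose intersection is empty."""
    y = -(-n // t)  # ceil(n / t)
    P = set(range(1, n + 1))
    quorums = []
    for b in range(t):
        lo = min(b * y, n - y)  # shift the last blocks so they end at n
        quorums.append(P - set(range(lo + 1, lo + y + 1)))
    return quorums


def can_make_progress(n, t):
    """Enough other processes stay alive to fill a minimum-size quorum."""
    return n - t > (n * (t - 1)) // t


qs = empty_intersection_quorums(10, 3)
```

For n = 10 and t = 3 the construction yields three quorums of size 6 with no common member, while min_quorum_size(10, 3) is 7. The predicate can_make_progress(n, t) agrees with $n > t^2$ for all n and t tried in the accompanying checks.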

5 Upper Bounds

We give a simple one-round protocol that implements sFS2a-d. We assume that a failure can be
suspected spontaneously (e.g., due to a timeout), but that no more than t failures are suspected in
any run. In this protocol, $\mathrm{SUSP}_{i,j} = \mathrm{ACK.SUSP}_{i,j}$ = "j failed".

• When process i suspects the failure of process j, i sends the message "j failed" to all processes
(including itself). Process i waits for messages of the form "j failed" from other processes
and takes no other action except for acknowledging "x failed" messages until it completes
the protocol or crashes.

• When process i has received messages of the form "j failed" from more than $n\left(\frac{t-1}{t}\right)$ processes
(including itself), i executes $\mathit{failed}_i(j)$.

• When process x receives a message of the form "x failed", x executes $\mathit{crash}_x$.

• When process i receives a message of the form "y failed", i suspects the failure of y.
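
The four rules above can be exercised with a small simulation. The Python sketch below is a simplified illustration under stated assumptions, not the authors' implementation: the class `OneRoundSim` and its methods are hypothetical, channels are modelled as reliable FIFO queues, processes are numbered from 0, and the rule that a suspecting process blocks all other activity until it finishes the protocol is omitted because no application messages are modelled.

```python
from collections import deque


class OneRoundSim:
    """Toy simulation of the one-round protocol over reliable FIFO channels."""

    def __init__(self, n: int, t: int):
        self.n = n
        # failed_i(j) needs "j failed" from strictly more than n(t-1)/t senders.
        self.threshold = (n * (t - 1)) // t
        self.crashed = set()
        self.suspected = [set() for _ in range(n)]  # suspicions i has announced
        self.heard = [dict() for _ in range(n)]     # heard[i][j]: senders of "j failed"
        self.detected = [set() for _ in range(n)]   # j for which failed_i(j) was executed
        # chan[(src, dst)] holds the subjects j of in-flight "j failed" messages.
        self.chan = {(s, d): deque() for s in range(n) for d in range(n)}

    def suspect(self, i: int, j: int) -> None:
        """Process i suspects j and sends "j failed" to everyone, itself included."""
        if i in self.crashed or j in self.suspected[i]:
            return
        self.suspected[i].add(j)
        for dst in range(self.n):
            self.chan[(i, dst)].append(j)

    def deliver(self, src: int, dst: int) -> None:
        """Deliver the oldest in-flight message on the channel src -> dst."""
        if not self.chan[(src, dst)]:
            return
        j = self.chan[(src, dst)].popleft()
        if dst in self.crashed:
            return                     # a crashed process takes no further steps
        if j == dst:
            self.crashed.add(dst)      # receiving "dst failed" forces crash_dst
            return
        self.heard[dst].setdefault(j, set()).add(src)
        self.suspect(dst, j)           # receiving "j failed" makes dst suspect j
        if len(self.heard[dst][j]) > self.threshold:
            self.detected[dst].add(j)  # quorum reached: execute failed_dst(j)

    def run(self) -> None:
        """Deliver messages channel by channel until none are in flight."""
        progress = True
        while progress:
            progress = False
            for (src, dst), queue in self.chan.items():
                if queue:
                    self.deliver(src, dst)
                    progress = True


sim = OneRoundSim(n=5, t=2)
sim.suspect(0, 4)  # process 0 times out on process 4
sim.run()
print(sim.crashed, sim.detected)
```

In this run, process 4 crashes on receiving "4 failed" and every other process executes the detection once it has heard "4 failed" from three senders. The delivery order in `run` is just one possible schedule; an asynchronous system admits many others.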

We will argue informally that this protocol implements the simulated fail-stop properties. 

sFS2a: Process i cannot execute $\mathit{failed}_i(j)$ without sending a message of the form "j failed" to all
other processes, including j. Since channels are nonfaulty, j will eventually receive such a
message, upon which j will crash.

sFS2b: The full proof is given in Appendix A.4. We give an outline of the proof for cycles of length
2. Suppose that the protocol generates a run r such that $r \models \Diamond(\mathrm{FAILED}_i(j) \wedge \mathrm{FAILED}_j(i))$. By
Theorem 7, $r \models \Box W$ holds. Therefore, there is some witness w such that i received "j failed"
from w and j received "i failed" from w. Process w sends these messages to all processes. If
w sends "j failed" before it sends "i failed", then process j will receive "j failed" and crash
before it can execute $\mathit{failed}_j(i)$. Similarly, if w sends "i failed" before it sends "j failed", then
process i will receive "i failed" and crash before it can execute $\mathit{failed}_i(j)$. Therefore, it is not
possible for both $\mathit{failed}_i(j)$ and $\mathit{failed}_j(i)$ to be executed in a run.

sFS2c: Process i cannot execute $\mathit{failed}_i(i)$ without receiving at least one message of the form "i
failed". Upon receiving such a message, i crashes. Therefore, $\mathit{failed}_i(i)$ is never executed.

sFS2d: Since channels are FIFO, any message m sent by i to k after $\mathit{failed}_i(j)$ is executed must be
received after the message "j failed". Upon receiving "j failed" from i, process k suspects
the failure of j and initiates the failure detection protocol. Process k does not receive m
until either $\mathit{crash}_k$ or $\mathit{failed}_k(j)$ is executed. Therefore, message m is not received by k unless
$\mathit{failed}_k(j)$ has been executed.
6 Discussion


In Section 3.2, we showed that Conditions 1, 2, and 3 are necessary for any failure model to be 
indistinguishable from the fail-stop model. In Section 4, we showed that the Witness Property 
is necessary for any one-round protocol implementing Condition 2. We then showed that the 
Witness Property imposes lower bounds on the number of messages that must be received before 
a failure can be detected and on the number of failures that can be tolerated in a system. 

We gave a protocol in Section 5 to demonstrate that these bounds are tight. This protocol, 
however, was derived from conditions that are not necessary for indistinguishability. There may 
be a failure model weaker than sFS that is indistinguishable from FS. However, such a failure 
model is subject to the same bounds on t as sFS, and so we do not expect such a failure model to 
be substantially more interesting than sFS. 

The bounds on t arise from sFS2b. A failure model satisfying only the other sFS assumptions 
would not require a process to wait for any messages before detecting a failure: the other sFS 
properties can be implemented simply by having process i broadcast a message "j failed" after 
suspecting j's failure and before unilaterally executing $\mathit{failed}_i(j)$. Such a failure model would, of
course, be distinguishable from FS, but if a collection of processes are insensitive to cyclic failures, 
then they could be run in this cheaper simulated failure model. We do not know of any protocols 
in the literature that are insensitive to cyclic failure detection, however. 

As an example of sensitivity to sFS2b, consider the problem of determining the last process to 
fail ([Ske85]). Solving this problem requires that processes record information about the failures 
that they detect (that is, their view of the failed-before relation). Then, when processes are 
recovering after a total failure, the recovering processes can determine when the last processes to 
fail have recovered. If cyclic failure detection is possible, then the problem is not solvable. For 
example, suppose P = {1,2}, process 1 falsely detects 2's failure, and then crashes. Process 2 
detects 1's failure, proceeds with its work, and finally crashes. If process 1 were to then recover, it
would conclude that it was the last to fail. In general, if cyclic detection is possible then the only 
possible recovery is to always wait for all crashed processes to recover. 
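
The two-process example can be made concrete by treating the recorded detections as a directed graph and testing it for cycles. The sketch below is an illustration only; the function name and the dictionary representation of the failed-before records are assumptions introduced here.

```python
def has_cyclic_detection(detected: dict) -> bool:
    """Return True if the recorded detections contain a cycle.

    detected[i] is the set of processes j for which failed_i(j) was executed.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {p: WHITE for p in detected}

    def visit(p) -> bool:
        colour[p] = GREY
        for q in detected.get(p, ()):
            state = colour.get(q, WHITE)
            if state == GREY:                 # back edge: a detection cycle
                return True
            if state == WHITE and visit(q):
                return True
        colour[p] = BLACK
        return False

    return any(colour[p] == WHITE and visit(p) for p in detected)


# Process 1 falsely detects 2 and crashes; process 2 then detects 1.
print(has_cyclic_detection({1: {2}, 2: {1}}))
# Under sFS2b only one of the two detections can occur.
print(has_cyclic_detection({1: set(), 2: {1}}))
```

In the cyclic case, each process's own record claims that the other failed first, so neither record alone identifies the last process to fail.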

There are other protocols that require failure models even stronger than FS. For example, if 
the failed-before relation is transitive as well as acyclic, then detecting the last process to fail can 
be implemented so that as soon as the last processes to fail have recovered, then the processes can 
determine this. If the failed-before relation is not transitive, then it is necessary to wait for more 
processes to recover. The failed-before relation of sFS is not transitive. We are currently looking 
into several stronger versions of fail-stop, whether they are implementable given fail-stop, and
into how they too can be simulated. 

The protocols described in this paper are very simple and are easily implementable. Failure 
detection such as described here is typically done as part of a group membership service (e.g., 
[RB91,MPS91,ADKM92]). We believe that the protocols here could be used as the basis of a failure 
detector that could be used outside of a system built using a group-membership protocol. This
would allow for consistent failure detection on top of any kind of lower-level communication, 
including point-to-point communication. 
A Appendices

A.1 Formal Definition of Events, Runs, and Special Predicates

Recall that an event e is a function that maps global states to global states. An event e applied to a
global state $\Sigma$ either

• yields $\Sigma$, in which case we say that e is a null event; or

• yields a new global state $\Sigma'$ that differs from $\Sigma$ in the local state of exactly one process i and
the state of at most one channel incident on i. We say in this case that e is an event of i, and
that e changes the state of i.

A non-null event e is uniquely defined by the process i whose state it changes, the state s of i
immediately before e is applied, the state s' of i resulting from e, the states of the channels incident
on i before e is applied, and the states of the channels incident on i after e is applied. Let $\lambda_{i,j}$ be
the state of channel $C_{i,j}$. Let $\lambda_{i,*}$ be the n-tuple $(\lambda_{i,1}, \lambda_{i,2}, \ldots, \lambda_{i,n})$ and let $\lambda_{*,i}$ be the n-tuple
$(\lambda_{1,i}, \lambda_{2,i}, \ldots, \lambda_{n,i})$. Then, e is defined by the 7-tuple $(i, s, s', \lambda_{i,*}, \lambda'_{i,*}, \lambda_{*,i}, \lambda'_{*,i})$, such that:

• if $\lambda_{i,*} \ne \lambda'_{i,*}$ (e is a send event), then $\lambda'_{*,i} = \lambda_{*,i}$, there exists exactly one $j \ne i$ such that
$\lambda'_{i,j} \ne \lambda_{i,j}$, and $\lambda'_{i,j} = (\lambda_{i,j} :: m)$ for some message m (where :: is the catenation operator).

• if $\lambda_{*,i} \ne \lambda'_{*,i}$ (e is a receive event), then $\lambda'_{i,*} = \lambda_{i,*}$, there exists exactly one $j \ne i$ such that
$\lambda_{j,i} \ne \lambda'_{j,i}$, and $(m :: \lambda'_{j,i}) = \lambda_{j,i}$ for some message m.

If e is a null event, then e is not of any process i and therefore is not represented by a 7-tuple.

Definition 6 We say that (non-null) $e = (i, s, s', \lambda_{i,*}, \lambda'_{i,*}, \lambda_{*,i}, \lambda'_{*,i})$ can occur in global state $\Sigma$ if and
only if:

• the state of process i in $\Sigma$ is s,

• the states of the incoming channels incident on i in $\Sigma$ are $\lambda_{*,i}$, and

• the states of the outgoing channels incident on i in $\Sigma$ are $\lambda_{i,*}$.

A null event can occur in any state.

Let $e = (i, s, s', \lambda_{i,*}, \lambda'_{i,*}, \lambda_{*,i}, \lambda'_{*,i})$. We abbreviate send and receive events as follows.

• If e is a send event of i and $\lambda'_{i,j} = (\lambda_{i,j} :: m)$ for some j, then e is denoted $\mathit{send}_i(j, m)$.

• If e is a receive event of i and $(m :: \lambda'_{j,i}) = \lambda_{j,i}$ for some j, then e is denoted $\mathit{recv}_i(j, m)$.

We define "crash" events and "failure detection" events as follows:

• If $\mathrm{CRASH}_i$ is false in s and true in s', then e is denoted $\mathit{crash}_i$. By assumption, $\mathit{crash}_i$ changes
only the local variable $\mathrm{CRASH}_i$.

• If $\exists j$: $\mathrm{FAILED}_i(j)$ is false in s and true in s', then e is denoted $\mathit{failed}_i(j)$.

The events $\mathit{send}_i(j, m)$, $\mathit{recv}_i(j, m)$, $\mathit{crash}_i$, and $\mathit{failed}_i(j)$ are atomic; each event only changes the
relevant state variables of the process on which it occurs. For example, if $\mathrm{CRASH}_i$ is false in local
state s of i when $\mathit{send}_i(j, m)$ occurs, then $\mathrm{CRASH}_i$ is false in the resulting state of i.

Definition 7 Let $r = (\Sigma_0, \Sigma_1, \Sigma_2, \ldots)$ be an infinite sequence of global states of the system. We say that
r is a run of the system if and only if $\Sigma_0$ is an initial global state and there exists a sequence of events
$(e_0, e_1, e_2, \ldots)$ such that for all $i \ge 0$, $e_i$ can occur in $\Sigma_i$ and $\Sigma_{i+1} = e_i(\Sigma_i)$.

• $\forall i, j, m$: $\mathrm{SEND}_i(j, m)$ and $\mathrm{RECV}_i(j, m)$ are false in an initial global state.

• Let $e = \mathit{send}_i(j, m)$ and let $\Sigma$ be a global state such that e can occur in $\Sigma$. Then $\mathit{send}_i(j, m)(\Sigma) \models \mathrm{SEND}_i(j, m)$.
That is, $\mathrm{SEND}_i(j, m)$ becomes true when $\mathit{send}_i(j, m)$ has occurred.

• Let $e = \mathit{recv}_i(j, m)$ and let $\Sigma$ be a global state such that e can occur in $\Sigma$. Then $\mathit{recv}_i(j, m)(\Sigma) \models \mathrm{RECV}_i(j, m)$.
That is, $\mathrm{RECV}_i(j, m)$ becomes true when $\mathit{recv}_i(j, m)$ has occurred.

A.2 Proof of Theorem 5 

Theorem 5 The simulated fail-stop model (sFS) is indistinguishable from the fail-stop model (FS).

In order to prove that for any run r that satisfies FS1 and sFS2a-d, there is an isomorphic run
r' that satisfies FS1 and FS2, we will need to determine the conditions under which an event in a
history $\mathcal{H}_r$ can be moved to yield a history $\mathcal{H}_{r'}$ such that $r =_P r'$.

Consider $\mathcal{H}_r = (\ldots e_i, e_{i+1}, e_{i+2} \ldots)$ corresponding to run $r = (\ldots, \Sigma_i, \Sigma_{i+1}, \Sigma_{i+2}, \ldots)$. By
definition, $e_i$ can occur in $\Sigma_i$ and $e_{i+1}$ can occur in $e_i(\Sigma_i) = \Sigma_{i+1}$. Assume that $e_i$ and $e_{i+1}$ are
non-null events.

Suppose that $e_i$ and $e_{i+1}$ are of the same process k. Since $e_i$ changes the state of k, the state of
k is not the same in $\Sigma_i$ as in $\Sigma_{i+1}$. Therefore, $e_{i+1}$ cannot occur in $\Sigma_i$.

Now suppose that $e_i$ and $e_{i+1}$ are of two different processes k and $\ell$, respectively. The state of
$\ell$ in $\Sigma_i$ is the same as that in $\Sigma_{i+1}$, because $e_i$ does not change the state of $\ell$. Therefore, if $e_{i+1}$ is not
a receive event, then $e_{i+1}$ can occur in $\Sigma_i$. If $e_{i+1}$ is a receive event, and changes the state of any
incoming channel other than $C_{k,\ell}$, then $e_{i+1}$ can occur in $\Sigma_i$, because the states of all other incoming
channels must be the same in $\Sigma_i$ and $\Sigma_{i+1}$. However, if $e_{i+1} = \mathit{recv}_\ell(k, m)$ and $e_i = \mathit{send}_k(\ell, m)$,
then $e_{i+1}$ cannot occur in $\Sigma_i$, because the message m is not part of $\lambda_{k,\ell}$ in $\Sigma_i$.

In summary, $e_{i+1}$ cannot occur in $\Sigma_i$ if and only if

• $e_i$ and $e_{i+1}$ are of the same process, or

• $e_i = \mathit{send}_k(\ell, m)$ and $e_{i+1} = \mathit{recv}_\ell(k, m)$.

In other words, $e_{i+1}$ cannot occur in $\Sigma_i$ if and only if $(e_i \to e_{i+1})$.

Assume that $e_{i+1}$ can occur in $\Sigma_i$, and let $\Sigma'_{i+1} = e_{i+1}(\Sigma_i)$. It can be shown by a similar
argument that $e_{i+1}$ cannot change the state of k, $\lambda_{k,\ell}$, or $\lambda_{\ell,k}$ in such a way as to violate the
preconditions for $e_i$, so $e_i$ can always occur in $\Sigma'_{i+1}$. Furthermore, $e_i(e_{i+1}(\Sigma_i)) = e_{i+1}(e_i(\Sigma_i))$.
Therefore, $r' = (\ldots, \Sigma_i, \Sigma'_{i+1}, \Sigma_{i+2}, \ldots)$ is a valid run, where $\mathcal{H}_{r'} = (\ldots e_{i+1}, e_i, e_{i+2} \ldots)$.

Consider $r_k$ and $r_\ell$. (Recall that repeated states are removed in these sequences.) From
the construction of r', $r_k = r'_k$ and $r_\ell = r'_\ell$. Since $e_i$ and $e_{i+1}$ do not change the states of processes
other than k and $\ell$, $r_t = r'_t$ for all processes $t \notin \{k, \ell\}$. Therefore, $r =_P r'$.

In summary, we have shown that if $\neg(e_i \to e_{i+1})$ in $\mathcal{H}_r$, then $e_{i+1}$ can be moved before $e_i$ to
yield $\mathcal{H}_{r'}$ such that $r' =_P r$. It can also be shown that for any two events $e_i$ and $e_j$ in $\mathcal{H}_r$ such
that $i < j$ and $\neg(e_i \to e_j)$, $e_j$ can occur in $\Sigma_i$, $e_i$ can occur in $e_j(\Sigma_i)$, and $e_j(e_i(\Sigma_i)) = e_i(e_j(\Sigma_i))$.
Therefore, $e_j$ can be moved to directly before $e_i$ to yield $\mathcal{H}_{r'}$ such that $r =_P r'$.
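
The commutation argument can be illustrated with a small executable model of global states. The sketch below is an assumption-laden illustration, not the paper's formalism: a global state is a pair of tuples (local histories and FIFO channel contents), processes are numbered from 0, and the function names are hypothetical. A send appends to the tail of a channel and a receive removes from its head, matching the catenation conventions of Appendix A.1.

```python
def _set(tup: tuple, idx: int, value) -> tuple:
    """Return a copy of tup with position idx replaced by value."""
    return tup[:idx] + (value,) + tup[idx + 1:]


def initial_state(n: int) -> tuple:
    """Global state: (local states, channel states); chans[i][j] models C_{i,j}."""
    locs = tuple(() for _ in range(n))
    chans = tuple(tuple(() for _ in range(n)) for _ in range(n))
    return (locs, chans)


def send(state: tuple, i: int, j: int, m) -> tuple:
    """Event send_i(j, m): append m to the tail of channel C_{i,j}."""
    locs, chans = state
    locs = _set(locs, i, locs[i] + (("send", j, m),))
    chans = _set(chans, i, _set(chans[i], j, chans[i][j] + (m,)))
    return (locs, chans)


def can_recv(state: tuple, i: int, j: int, m) -> bool:
    """recv_i(j, m) can occur only if m is at the head of channel C_{j,i}."""
    _, chans = state
    return len(chans[j][i]) > 0 and chans[j][i][0] == m


def recv(state: tuple, i: int, j: int, m) -> tuple:
    """Event recv_i(j, m): remove m from the head of channel C_{j,i}."""
    if not can_recv(state, i, j, m):
        raise ValueError("recv cannot occur in this global state")
    locs, chans = state
    locs = _set(locs, i, locs[i] + (("recv", j, m),))
    chans = _set(chans, j, _set(chans[j], i, chans[j][i][1:]))
    return (locs, chans)


s0 = initial_state(3)
# Two events of different processes, unrelated by happens-before, commute.
order_a = send(send(s0, 0, 1, "a"), 2, 1, "b")
order_b = send(send(s0, 2, 1, "b"), 0, 1, "a")
print(order_a == order_b)
# A receive cannot be moved before its matching send.
print(can_recv(s0, 1, 0, "a"), can_recv(send(s0, 0, 1, "a"), 1, 0, "a"))
```

Swapping the two sends yields the same global state, while the receive becomes enabled only after the corresponding send, which mirrors the two cases distinguished in the summary above.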

We can now prove the theorem. 

Proof: If run r satisfies FS2 then the theorem trivially holds, so we assume that r violates FS2.
Then, there exists at least one pair of processes i and j such that $r \models \Diamond(\mathrm{FAILED}_j(i) \wedge \neg\mathrm{CRASH}_i)$. For
each such pair, by sFS2a, $r \models \Diamond\mathrm{CRASH}_i$. Therefore, $\mathcal{H}_r$ is of the form $(\cdots \mathit{failed}_j(i) \cdots \mathit{crash}_i \cdots)$.

Definition 8 A pair of processes (i, j) is bad in $\mathcal{H}_r$ if $\mathcal{H}_r = (\cdots \mathit{failed}_j(i) \cdots \mathit{crash}_i \cdots)$. Otherwise,
(i, j) is good in $\mathcal{H}_r$.

We prove the theorem by induction on the number of bad process pairs in $\mathcal{H}_r$.

Base case Assume that there is only one bad pair. Let $\mathcal{H}_r = (x; \mathit{failed}_j(i); y; \mathit{crash}_i; z)$ where
x, y, and z are sequences of events. Let k be the number of events in y. We construct by induction
on k a run r' isomorphic to r such that $\mathcal{H}_{r'} = (x'; \mathit{crash}_i; \mathit{failed}_j(i); y'; z)$ where x' and y' are sequences
of events.

Base Case (Inner Induction) Assume k = 0. $\mathcal{H}_r = (x; \mathit{failed}_j(i); \mathit{crash}_i; z)$. Since
$\mathit{crash}_i$ and $\mathit{failed}_j(i)$ are of different processes, they can be swapped to yield
$\mathcal{H}_{r'} = (x; \mathit{crash}_i; \mathit{failed}_j(i); z)$ such that $r' =_P r$. Clearly, r' satisfies FS2.

Induction case (Inner Induction) Assume that the theorem holds for all histories
in which $k \le \ell - 1$, and assume that $k = \ell$. $\mathcal{H}_r = (x; \mathit{failed}_j(i); e_1; e_2; \cdots; e_\ell; \mathit{crash}_i; z)$.
By Lemma 4 we know that $\neg(\mathit{failed}_j(i) \to \mathit{crash}_i)$. Let $e_u$ be the first event of
$(e_1; \cdots; \mathit{crash}_i)$ such that $\neg(\mathit{failed}_j(i) \to e_u)$. Since $e_u$ is the first such event and $\to$ is
transitive, $\forall x : 1 \le x < u : \neg(e_x \to e_u)$. Let $Q \subseteq P$ be the set of processes such that
$\mathit{failed}_j(i), e_1, \ldots, e_{u-1}$ are events of processes in Q. Then $e_u$ is an event of a process
not in Q, since otherwise $e_u$ would follow an earlier event of the same process and hence $\mathit{failed}_j(i)$.
Therefore, there is a history $\mathcal{H}_{r''} = (x; e_u; \mathit{failed}_j(i); e_1; e_2; \cdots; e_{u-1}; e_{u+1}; \cdots; e_\ell; \mathit{crash}_i; z)$
such that $r'' =_P r$. By the induction hypothesis there is a history $\mathcal{H}_{r'}$ of the desired
form such that $r' =_P r''$, and hence $r' =_P r$.

□ (Inner Induction)

Induction case Assume that there are k bad pairs in $\mathcal{H}_r$, one of which is (x, y). We will show
that we can use the same inductive construction presented in the Base Case to yield a history $\mathcal{H}_{r'}$,
such that $r' =_P r$, with strictly fewer bad pairs, so that the Inductive Hypothesis applies to $\mathcal{H}_{r'}$.

Overview: Given a bad pair (x, y), consider another pair of processes (a, b). Using a case
analysis on all possible placements of $\mathit{failed}_b(a)$ and $\mathit{crash}_a$ in $\mathcal{H}_r$ with respect to $\mathit{failed}_y(x)$ and
$\mathit{crash}_x$, we show that using the earlier inductive construction, we can "fix" (x, y) (i.e., construct
a history $\mathcal{H}_{r'}$ in which (x, y) is good) such that:

• if (a, b) is bad in $\mathcal{H}_r$, then (a, b) is either good or bad in $\mathcal{H}_{r'}$;

• if (a, b) is good in $\mathcal{H}_r$, then (a, b) is either still good in $\mathcal{H}_{r'}$, or is bad in $\mathcal{H}_{r'}$ but can be
fixed without making (x, y) bad again by using a finite number of applications of the same
inductive construction.

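There are twelve possible placements of $\mathit{failed}_b(a)$ and $\mathit{crash}_a$ with respect to $\mathit{failed}_y(x)$ and
$\mathit{crash}_x$. In each case, we consider the effect on (a, b) of applying the inductive construction to $\mathcal{H}_r$.

1. $\cdots\ \mathit{crash}_a\ \cdots\ \mathit{failed}_b(a)\ \cdots\ \mathit{failed}_y(x)\ \cdots\ \mathit{crash}_x\ \cdots$

2. $\cdots\ \mathit{failed}_b(a)\ \cdots\ \mathit{crash}_a\ \cdots\ \mathit{failed}_y(x)\ \cdots\ \mathit{crash}_x\ \cdots$

3. $\cdots\ \mathit{failed}_y(x)\ \cdots\ \mathit{crash}_x\ \cdots\ \mathit{crash}_a\ \cdots\ \mathit{failed}_b(a)\ \cdots$

4. $\cdots\ \mathit{failed}_y(x)\ \cdots\ \mathit{crash}_x\ \cdots\ \mathit{failed}_b(a)\ \cdots\ \mathit{crash}_a\ \cdots$

5. $\cdots\ \mathit{failed}_b(a)\ \cdots\ \mathit{failed}_y(x)\ \cdots\ \mathit{crash}_x\ \cdots\ \mathit{crash}_a\ \cdots$

6. $\cdots\ \mathit{crash}_a\ \cdots\ \mathit{failed}_y(x)\ \cdots\ \mathit{crash}_x\ \cdots\ \mathit{failed}_b(a)\ \cdots$

Since only events that occur between $\mathit{failed}_y(x)$ and $\mathit{crash}_x$ are moved, (a, b) is independent
of (x, y) in these six cases, in that fixing (x, y) has no effect on the goodness of (a, b). Thus,
(x, y) becomes good and (a, b) is unchanged.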

7. $\cdots\ \mathit{failed}_b(a)\ \cdots\ \mathit{failed}_y(x)\ \cdots\ \mathit{crash}_a\ \cdots\ \mathit{crash}_x\ \cdots$

In this case, the history $\mathcal{H}_{r'}$ resulting from an application of the construction of the base case
has one of two forms, depending on whether or not $\mathit{failed}_y(x) \to \mathit{crash}_a$:

• $\mathcal{H}_{r'} = (\cdots\ \mathit{failed}_b(a)\ \cdots\ \mathit{crash}_a\ \cdots\ \mathit{crash}_x; \mathit{failed}_y(x)\ \cdots)$

• $\mathcal{H}_{r'} = (\cdots\ \mathit{failed}_b(a)\ \cdots\ \mathit{crash}_x; \mathit{failed}_y(x)\ \cdots\ \mathit{crash}_a\ \cdots)$

In either case, (x, y) is now good and (a, b) remains bad.

8. $\cdots\ \mathit{failed}_y(x)\ \cdots\ \mathit{crash}_a\ \cdots\ \mathit{crash}_x\ \cdots\ \mathit{failed}_b(a)\ \cdots$

In this case, the history $\mathcal{H}_{r'}$ resulting from an application of the construction of the base case
has one of two forms:

• $\mathcal{H}_{r'} = (\cdots\ \mathit{crash}_a\ \cdots\ \mathit{crash}_x; \mathit{failed}_y(x)\ \cdots\ \mathit{failed}_b(a)\ \cdots)$

• $\mathcal{H}_{r'} = (\cdots\ \mathit{crash}_x; \mathit{failed}_y(x)\ \cdots\ \mathit{crash}_a\ \cdots\ \mathit{failed}_b(a)\ \cdots)$

In either case, (x, y) is now good and (a, b) remains good.

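9. $\cdots\ \mathit{crash}_a\ \cdots\ \mathit{failed}_y(x)\ \cdots\ \mathit{failed}_b(a)\ \cdots\ \mathit{crash}_x\ \cdots$

In this case, the history $\mathcal{H}_{r'}$ resulting from an application of the construction of the base case
has one of two forms:

• $\mathcal{H}_{r'} = (\cdots\ \mathit{crash}_a\ \cdots\ \mathit{failed}_b(a)\ \cdots\ \mathit{crash}_x; \mathit{failed}_y(x)\ \cdots)$

• $\mathcal{H}_{r'} = (\cdots\ \mathit{crash}_a\ \cdots\ \mathit{crash}_x; \mathit{failed}_y(x)\ \cdots\ \mathit{failed}_b(a)\ \cdots)$

In either case, (x, y) is now good and (a, b) remains good.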

10. $\cdots\ \mathit{failed}_y(x)\ \cdots\ \mathit{failed}_b(a)\ \cdots\ \mathit{crash}_x\ \cdots\ \mathit{crash}_a\ \cdots$

In this case, the history $\mathcal{H}_{r'}$ resulting from an application of the construction of the base case
has one of two forms:

• $\mathcal{H}_{r'} = (\cdots\ \mathit{failed}_b(a)\ \cdots\ \mathit{crash}_x; \mathit{failed}_y(x)\ \cdots\ \mathit{crash}_a\ \cdots)$

• $\mathcal{H}_{r'} = (\cdots\ \mathit{crash}_x; \mathit{failed}_y(x)\ \cdots\ \mathit{failed}_b(a)\ \cdots\ \mathit{crash}_a\ \cdots)$

In either case, (x, y) is now good and (a, b) remains bad.

11. $\cdots\ \mathit{failed}_y(x)\ \cdots\ \mathit{failed}_b(a)\ \cdots\ \mathit{crash}_a\ \cdots\ \mathit{crash}_x\ \cdots$

In this case, the history $\mathcal{H}_{r'}$ resulting from an application of the construction of the base case
has one of four forms:

• $\mathcal{H}_{r'} = (\cdots\ \mathit{failed}_b(a)\ \cdots\ \mathit{crash}_a\ \cdots\ \mathit{crash}_x; \mathit{failed}_y(x)\ \cdots)$

• $\mathcal{H}_{r'} = (\cdots\ \mathit{failed}_b(a)\ \cdots\ \mathit{crash}_x; \mathit{failed}_y(x)\ \cdots\ \mathit{crash}_a\ \cdots)$

• $\mathcal{H}_{r'} = (\cdots\ \mathit{crash}_x; \mathit{failed}_y(x)\ \cdots\ \mathit{failed}_b(a)\ \cdots\ \mathit{crash}_a\ \cdots)$

• $\mathcal{H}_{r'} = (\cdots\ \mathit{crash}_a\ \cdots\ \mathit{crash}_x; \mathit{failed}_y(x)\ \cdots\ \mathit{failed}_b(a)\ \cdots)$

In the first three cases, (x, y) is now good and (a, b) remains bad; in the fourth case, (x, y) is
now good and (a, b) is now good, thus reducing the number of bad pairs by two.
12. $\cdots\ \mathit{failed}_y(x)\ \cdots\ \mathit{crash}_a\ \cdots\ \mathit{failed}_b(a)\ \cdots\ \mathit{crash}_x\ \cdots$

In this case, the history $\mathcal{H}_{r'}$ resulting from an application of the construction of the base case
has one of four forms:

• $\mathcal{H}_{r'} = (\cdots\ \mathit{crash}_x; \mathit{failed}_y(x)\ \cdots\ \mathit{crash}_a\ \cdots\ \mathit{failed}_b(a)\ \cdots)$

• $\mathcal{H}_{r'} = (\cdots\ \mathit{crash}_a\ \cdots\ \mathit{failed}_b(a)\ \cdots\ \mathit{crash}_x; \mathit{failed}_y(x)\ \cdots)$

• $\mathcal{H}_{r'} = (\cdots\ \mathit{crash}_a\ \cdots\ \mathit{crash}_x; \mathit{failed}_y(x)\ \cdots\ \mathit{failed}_b(a)\ \cdots)$

• $\mathcal{H}_{r'} = (\cdots\ \mathit{failed}_b(a)\ \cdots\ \mathit{crash}_x; \mathit{failed}_y(x)\ \cdots\ \mathit{crash}_a\ \cdots)$

In the first three cases, (x, y) is now good and (a, b) remains good. However, in the fourth
case, (x, y) is now good, but (a, b) is now bad. Thus, the number of bad pairs may not be
reduced. Furthermore, for each pair (i, j) such that $\mathit{failed}_j(i)$ and $\mathit{crash}_i$ appear in $\mathcal{H}_r$ in the
same order with respect to $\mathit{failed}_y(x)$ and $\mathit{crash}_x$ as $\mathit{failed}_b(a)$ and $\mathit{crash}_a$, there can be one more
bad pair in $\mathcal{H}_{r'}$ than there is in $\mathcal{H}_r$.

However, we can construct a history $\mathcal{H}_{r''}$ from $\mathcal{H}_{r'}$ in the same manner in which $\mathcal{H}_{r'}$ was
constructed from $\mathcal{H}_r$, such that (a, b) is good in $\mathcal{H}_{r''}$ and (x, y) remains good in $\mathcal{H}_{r''}$, as follows.

We have $\mathcal{H}_{r'} = (\cdots\ \mathit{failed}_b(a)\ \cdots\ \mathit{crash}_x; \mathit{failed}_y(x)\ \cdots\ \mathit{crash}_a\ \cdots)$. Recall that in the
construction of $\mathcal{H}_{r'}$ from $\mathcal{H}_r$, an event e between $\mathit{failed}_y(x)$ and $\mathit{crash}_x$ was moved if and only if
$\neg(\mathit{failed}_y(x) \to e)$. Therefore, since $\mathit{failed}_b(a)$ was moved in the construction of $\mathcal{H}_{r'}$ and $\mathit{crash}_a$
was not, it must be the case that in both $\mathcal{H}_r$ and $\mathcal{H}_{r'}$

$$\neg(\mathit{failed}_y(x) \to \mathit{failed}_b(a)) \wedge (\mathit{failed}_y(x) \to \mathit{crash}_a) \tag{1}$$

As shown in the case analysis, there are four possible results of applying the inductive
construction to $\mathcal{H}_{r'}$. Any of the first three possibilities yields a history $\mathcal{H}_{r''}$ in which (a, b)
is good and (x, y) remains good. We claim that the fourth possibility cannot occur.

Proof: Suppose, for contradiction, that $\mathcal{H}_{r''} = (\cdots\ \mathit{failed}_y(x)\ \cdots\ \mathit{crash}_a; \mathit{failed}_b(a)\ \cdots\ \mathit{crash}_x\ \cdots)$.
Then by the earlier argument it must be the case that in $\mathcal{H}_{r'}$ and $\mathcal{H}_{r''}$

$$\neg(\mathit{failed}_b(a) \to \mathit{failed}_y(x)) \wedge (\mathit{failed}_b(a) \to \mathit{crash}_x) \tag{2}$$

$(\mathit{failed}_y(x) \to \mathit{crash}_a)$ in $\mathcal{H}_{r'}$ implies that $\mathit{failed}_a(x)$ occurs in $\mathcal{H}_{r'}$ by sFS2d and the definition
of happens-before. Similarly, $(\mathit{failed}_b(a) \to \mathit{crash}_x)$ implies that $\mathit{failed}_x(a)$ occurs in $\mathcal{H}_{r'}$.
Thus, Equations 1 and 2 imply that both $\mathit{failed}_a(x)$ and $\mathit{failed}_x(a)$ occur in $\mathcal{H}_{r'}$, which
contradicts sFS2b. Therefore, $\mathcal{H}_{r''}$ cannot have the assumed form, so both (a, b) and (x, y)
must be good in $\mathcal{H}_{r''}$.

Thus, if fixing (x, y) in $\mathcal{H}_r$ results in t new pairs $(a_i, b_i)$ that are bad in $\mathcal{H}_{r'}$, then we can fix
all of these pairs in t applications of the inductive construction. (Note that the t bad pairs
do not interfere with each other: since all of them are bad, they all fall under one of the first
11 cases. Therefore, fixing one pair $(a_i, b_i)$ either fixes another pair $(a_j, b_j)$ or does not affect
$(a_j, b_j)$.)

Thus, the number of bad pairs in $\mathcal{H}_r$ can be reduced by at least one in some finite number
of applications of the inductive construction given in the base case. Furthermore, this number is
bounded by n.

Therefore, we can construct a history $\mathcal{H}_{r'}$ with fewer than k bad pairs such that $r' =_P r$. From
the Induction Hypothesis, there is a run r'' that satisfies FS2 such that $r' =_P r''$; therefore, $r =_P r''$.

□ 


A.3 Proof of Theorem 6

Theorem 6 $(\forall r : r \models \Box\,\mathrm{sFS2b}) \Rightarrow (\forall r : r \models \Box W)$.

We will show that $(\exists r : r \models \neg\Box W) \Rightarrow (\exists r : r \models \neg\Box\,\mathrm{sFS2b})$. To do this, we first assume that
W does not hold in some state of r, i.e., that it is possible for k failures to be detected such that
the quorum sets for those detections have an empty intersection. We then show that using this
assumption, a run can be constructed in which there is a k-cycle in the failed-before relation.

We divide the n processes in P into k sets $S_0, \ldots, S_{k-1}$ such that for $0 \le i \le k-1$, $i \in S_i$; that is,
processes 0 through k − 1 are in sets $S_0$ through $S_{k-1}$, and the rest of the processes are distributed
among $S_0$ through $S_{k-1}$.

Consider the following scenario. For all $i : 0 \le i \le k-1$:

1. Process i suspects the failure of process $i \oplus 1$, and sends the message $\mathrm{SUSP}_{i,i \oplus 1}$ to all processes
in P. The messages sent to the processes in set $S_{i \oplus 1}$ are delayed indefinitely.

2. As a result of Step 1, process i receives a message $\mathrm{SUSP}_{j \ominus 1,j}$ from process $j \ominus 1$ for all
$j \ne i$, $0 \le j \le k-1$, where $\oplus$ and $\ominus$ are addition and subtraction modulo k. Thus, process i does not learn that
another process has suspected it of having crashed.

3. Before receiving $\mathrm{SUSP}_{j \ominus 1,j}$, process i suspects the failure of process j, and sends $\mathrm{SUSP}_{i,j}$ to
all processes in P. The messages sent to the processes in set $S_{i \oplus 1}$ are delayed behind the
previous messages (recall that interprocess channels are FIFO). Process i also acknowledges
any SUSP messages with ACK.SUSP messages.

4. Process i has now received $\mathrm{ACK.SUSP}_{k,i \oplus 1}$ messages from all processes k in $\bigcup_{j \ne i \oplus 1} S_j$.

Let $Q_{i,i \oplus 1} = \bigcup_{j \ne i \oplus 1} S_j$ for all $i : 0 \le i \le k-1$. No process in $S_j$ is in $Q_{j \ominus 1,j}$; in other words, for every
process i in P, there is some quorum set of which i is not a member. Therefore, $\bigcap_{i=0}^{k-1} Q_{i,i \oplus 1} = \emptyset$.
Furthermore, by definition of $Q_{i,j}$ being a quorum, every process i has received enough ACK.SUSP
messages to execute $\mathit{failed}_i(i \oplus 1)$. We have $\mathit{failed}_0(1), \ldots, \mathit{failed}_{k-2}(k-1)$, and $\mathit{failed}_{k-1}(0)$, which
causes a k-cycle in the failed-before relation. □
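
The partition and quorum sets of this construction can be written out explicitly for a small system. The sketch below is illustrative only: the function names are hypothetical, and the round-robin assignment of the remaining processes to the sets $S_0, \ldots, S_{k-1}$ is an assumption, since the proof allows any distribution.

```python
def partition(n: int, k: int) -> list:
    """Sets S_0..S_{k-1} with process i in S_i; the rest are dealt round-robin."""
    return [set(range(i, n, k)) for i in range(k)]


def cycle_quorums(n: int, k: int) -> list:
    """Quorum of process i for detecting i+1 (mod k): every S_j except S_{i+1}."""
    sets = partition(n, k)
    quorums = []
    for i in range(k):
        excluded = (i + 1) % k  # messages to S_{i+1} are delayed indefinitely
        quorums.append(set().union(*(sets[j] for j in range(k) if j != excluded)))
    return quorums


quorums = cycle_quorums(n=7, k=3)
# Each detector belongs to its own quorum, yet no process belongs to all of them.
print(quorums, set.intersection(*quorums))
```

With $n = 7$ and $k = 3$, the three quorums have sizes 5, 5, and 4 and share no process, so the detections $\mathit{failed}_0(1)$, $\mathit{failed}_1(2)$, and $\mathit{failed}_2(0)$ have no common witness.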


A.4 Proof that the Protocol of Section 5 Implements sFS2b 

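Lemma 9 Given the protocol of Section 5, $[r \models \exists S = \{1, 2, \ldots, k\} : (\mathrm{FAILED}_1(2) \wedge \mathrm{FAILED}_2(3) \wedge \cdots \wedge \mathrm{FAILED}_{k-1}(k))]$
$\Rightarrow$ $[\exists q : (\mathit{send}_q(S, \text{"k failed"}) \to \mathit{send}_q(S, \text{"k−1 failed"}) \to \cdots \to \mathit{send}_q(S, \text{"2 failed"}))$
in $\mathcal{H}_r]$.

Proof: We use the notation $\mathrm{SEND}_q(S, m)$ as shorthand for $(\forall p \in S : \mathrm{SEND}_q(p, m))$.

The size of the quora is sufficient to ensure W, by Theorem 7. By W, $r \models \exists q : \forall i, j \in S :$
$\mathrm{FAILED}_i(j) \Rightarrow \mathrm{RECV}_i(q, \text{"j failed"}) \Rightarrow \mathrm{SEND}_q(S, \text{"j failed"})$. We prove the lemma by induction on k.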

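Base case For k = 2, the proof is trivial. Let k = 3. $S = \{1, 2, 3\}$, $r \models \mathrm{FAILED}_1(2) \wedge \mathrm{FAILED}_2(3)$,
and $r \models \mathrm{SEND}_q(S, \text{"2 failed"}) \wedge \mathrm{SEND}_q(S, \text{"3 failed"})$. Assume for contradiction that
$\mathit{send}_q(S, \text{"2 failed"}) \to \mathit{send}_q(S, \text{"3 failed"})$ in $\mathcal{H}_r$. Then, because channels are FIFO, $\mathit{recv}_2(q, \text{"2 failed"}) \to \mathit{recv}_2(q, \text{"3 failed"})$
in $\mathcal{H}_r$. By the protocol, $\mathit{crash}_2 \to \mathit{failed}_2(3)$ in $\mathcal{H}_r$, so $r \models \neg\mathrm{FAILED}_2(3)$. Therefore,
it must be the case that $\mathit{send}_q(S, \text{"3 failed"}) \to \mathit{send}_q(S, \text{"2 failed"})$.

Induction case Assume that the lemma is true for $k = \ell - 1$. For $k = \ell$, we have $\mathrm{FAILED}_1(2) \wedge \mathrm{FAILED}_2(3) \wedge \cdots \wedge \mathrm{FAILED}_{\ell-1}(\ell)$.
By the induction hypothesis, $\mathit{send}_q(S, \text{"ℓ−1 failed"}) \to \cdots \to \mathit{send}_q(S, \text{"2 failed"})$ in $\mathcal{H}_r$.
Assume for contradiction that $\mathit{send}_q(S, \text{"ℓ−1 failed"}) \to \mathit{send}_q(S, \text{"ℓ failed"})$
in $\mathcal{H}_r$. Then, as in the base case, $\mathit{recv}_{\ell-1}(q, \text{"ℓ−1 failed"}) \to \mathit{recv}_{\ell-1}(q, \text{"ℓ failed"})$, so $\mathit{crash}_{\ell-1} \to \mathit{failed}_{\ell-1}(\ell)$
in $\mathcal{H}_r$ and $r \models \neg\mathrm{FAILED}_{\ell-1}(\ell)$. Therefore, $\mathit{send}_q(S, \text{"ℓ failed"}) \to \mathit{send}_q(S, \text{"ℓ−1 failed"})$
in $\mathcal{H}_r$. □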

The quorum size for each failure detection is sufficient to guarantee W. Assume for contradiction
that the failed-before relation is not acyclic. Then $r \models \exists S = \{1, \ldots, k\} : \mathrm{FAILED}_1(2) \wedge \cdots \wedge \mathrm{FAILED}_{k-1}(k) \wedge \mathrm{FAILED}_k(1)$.
By Lemma 9, $\exists q : \mathit{send}_q(S, \text{"1 failed"}) \to \mathit{send}_q(S, \text{"k failed"}) \to \cdots \to \mathit{send}_q(S, \text{"2 failed"})$
in $\mathcal{H}_r$. Thus, $\mathit{recv}_1(q, \text{"1 failed"}) \to \mathit{recv}_1(q, \text{"2 failed"})$ in $\mathcal{H}_r$, $\mathit{crash}_1 \to \mathit{failed}_1(2)$
in $\mathcal{H}_r$, and $r \models \neg\mathrm{FAILED}_1(2)$. □